HOUSEHOLD LEVEL SEGMENTATION METHOD AND SYSTEM
BACKGROUND OF THE INVENTION 

Field of the Invention 

[0001] The invention relates to a method and apparatus for population segmentation. In
particular, the invention relates to a method and system of household-level
segmentation.



Related Art 

[0002] For marketing purposes, knowledge of customer behavior is important, if not crucial. 
For direct marketing, for example, it is desirable to focus the marketing on a portion of 
the segment likely to purchase the marketed product or service. 

[0003] In this regard, several methods have traditionally been used to divide the customer 
population into segments. The goal of such segmentation methods is to predict 
consumer behavior and classify consumers into clusters based on observable 
characteristics. Factors used to segment the population into clusters include 
demographic data such as age, marital status, and income, and behavioral data such as
tendency to purchase a particular product or service. 

[0004] In dividing the population into segments, it is desired to maximize the homogeneity 
within a cluster, while maximizing the distinctness across clusters. In this regard, 
traditional segmentation schema have employed a two-stage process involving targeted 
optimization and cluster evaluation. These schema can begin either with behavior
(behaviorally driven) or with demographics (demographically driven). 

[0005] Figure 1 illustrates a traditional, behaviorally driven segmentation process 100. At
block 110, a set of clusters of households is defined based on common behaviors within
each cluster. The clusters are defined such that the behaviors within each cluster are as
similar as possible, while being as different as possible across clusters. At block 120, 
the clusters are evaluated for demographics to determine whether the demographics of 
each cluster are sufficiently similar within the cluster, while being sufficiently different 
across the clusters. At block 130, if the demographics do not satisfy the criteria, the 
process is repeated from block 110 until an optimal segmentation is achieved. 
Although this iterative method may result in a useful segmentation system, it fails to
directly provide a solution that defines clusters based on demographics. 

[0006] Figure 2 illustrates a traditional, demographically driven segmentation process 200. At 
block 210, a set of clusters of households is defined based on common demographics
within each cluster. The clusters are defined such that the demographics within each 
cluster are as similar as possible, while being as different as possible across clusters. 
At block 220, the clusters are evaluated for behaviors to determine whether the 
behaviors of each cluster are sufficiently similar within the cluster, while being 
sufficiently different across the clusters. At block 230, if the behaviors do not satisfy 
the criteria, the process is repeated from block 210 until an optimal segmentation is 
achieved. Similarly to the system described above with reference to Figure 1, the
system of Figure 2 fails to directly provide a solution that defines clusters based on
behavior. 

[0007] Thus, while these traditional, iterative methods may result in a useful segmentation 
system, they fail to directly provide a solution that defines clusters based jointly on 
behavior and demographics. 

BRIEF DESCRIPTION OF THE DRAWINGS 
[0008] In the following, the invention will be explained in further detail with reference to the
drawings, in which:
[0009] Fig. 1 is a flow chart illustrating a traditional segmentation method; 
[0010] Fig. 2 is a flow chart illustrating another traditional segmentation method; 
[0011] Fig. 3 is an example of a classification tree; 
[0012] Fig. 4 is another example of a classification tree; 

[0013] Fig. 5 is a pictorial illustration of a segmentation system according to one embodiment
of the invention; and 

[0014] Fig. 6 is a pictorial illustration of a segmentation system according to another 
embodiment of the invention. 

DESCRIPTION OF CERTAIN EMBODIMENTS OF THE INVENTION 
[0015] The present invention provides a segmentation system for classifying households into 
market segments that can be used to describe, target and measure consumers by their 
demand for and use of particular products and services. The segments are optimized to
provide high-lift profiles for the evaluation profiles. 

[0016] One embodiment of the invention provides a method for classifying consumers in
clusters comprising generating a plurality of classification trees based on demographic
data for a set of consumers and behavioral data for a set of consumers, each of the
classification trees producing a consumer cluster set, and searching the consumer cluster
sets for an optimal consumer cluster set, the optimal consumer cluster set having a
plurality of clusters of consumers. Consumers in each cluster of the plurality of
clusters have substantially similar behavioral and demographic characteristics to each
other and different behavioral or demographic characteristics from consumers in all
other clusters of the plurality of clusters.

[0017] In a preferred embodiment, consumers in each cluster have different demographic
characteristics from consumers in all other clusters of the plurality of clusters. 

[0018] The segmentation system according to one embodiment of the present invention
employs a partitioning program that optimizes a segmentation based on both behavioral
and demographic factors. A classification tree methodology is used and all possible 
combinations of input variables are searched to identify an optimal combination which 
best predicts a targeted set of consumer behaviors. The classification tree methodology 
results in a set of terminal nodes. 

[0019] Figures 3 and 4 illustrate examples of classification trees. Referring first to Figure 3, a 
population at Node 1 is split based on Decision 1 into populations at Node 2 and Node 
3. These populations are further split according to additional decisions until terminal
nodes (shown in rectangular blocks), Nodes 6, 7, 8, 9, 10, 12 and 13, are reached. 
The terminal nodes represent clusters determined by the segmentation system. 

[0020] In Figure 4, the same population in Node 1 may be split into populations at Node 2 and
Node 3 based on a different decision, Decision 5 for example, than used to split Node 1 
in Figure 3. Similarly, further decisions are used to split the populations until terminal 
nodes, Nodes 4, 5, 8, 10, 11, 12 and 13, are reached. Thus, the partitioning program 
of the present invention searches all possible classification trees to determine an optimal 
combination. 
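
As a rough illustration of how terminal nodes act as clusters, the following Python sketch represents a classification tree and assigns a household to a terminal node. It is a minimal sketch only: the `Node` class, the `assign` function, the dimension names and the threshold values are hypothetical, and each decision is assumed to be a threshold test on a single dimension.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """One node of a classification tree (hypothetical structure)."""
    node_id: int
    dimension: Optional[str] = None   # None marks a terminal node
    value: Optional[float] = None     # threshold used by the decision
    left: Optional["Node"] = None     # branch taken when household[dimension] <= value
    right: Optional["Node"] = None    # branch taken when household[dimension] > value


def assign(node: Node, household: dict) -> int:
    """Walk the tree until a terminal node is reached; its id is the cluster."""
    while node.dimension is not None:
        if household[node.dimension] <= node.value:
            node = node.left
        else:
            node = node.right
    return node.node_id


# Illustrative tree: Node 1 splits on age, Node 3 splits on income.
tree = Node(1, "age", 40, Node(2), Node(3, "income", 50000, Node(4), Node(5)))
```

A different tree over the same population, as in Figure 4, would simply use other dimensions or thresholds at each decision and therefore produce a different set of terminal nodes.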

[0021] For an optimal combination, each terminal node represents a segment that is
homogeneous in both behavior and demographics. One example of a classification tree
methodology is disclosed in "Classification Trees for Multiple Binary Responses" by
Heping Zhang, Journal of the American Statistical Association, March 1998, which is 
hereby incorporated by reference. The classification tree methodology described 
therein is hereinafter referred to as "Zhang's methodology". 

[0022] In one embodiment of a segmentation system according to the present invention, the
program searches for a combination that optimizes a measure of behavior and 
demographic data. For example, for all possible splits in the classification tree, the 
program selects the split that maximizes: 

$$\mathrm{LFract}_{vds} \times \mathrm{RFract}_{vds} \times \mathrm{TFract}_{s} \times \sum_{p} \left| \mathrm{LPen}_{p(vds)} - \mathrm{RPen}_{p(vds)} \right|$$

$\mathrm{LFract}_{vds} = \mathrm{LCount}_{vds} \div \mathrm{TCount}_{s}$

$\mathrm{RFract}_{vds} = \mathrm{RCount}_{vds} \div \mathrm{TCount}_{s}$

$\mathrm{TFract}_{s} = \mathrm{TCount}_{s} \div$ Total population over all segments (S).

$\mathrm{LPen}_{p(vds)}$ = For a given profile p within a split of segment s, dimension d, by
value v, count of Profile in the left split $\div$ count of base in the left split.

$\mathrm{RPen}_{p(vds)}$ = For a given profile p within a split of segment s, dimension d, by
value v, count of Profile in the right split $\div$ count of base in the right split.

{S} = The set of segments being evaluated.

s = A specific element of {S}.

{D} = The set of dimensions being evaluated.

d = A specific element of {D}.

{V} = The set of values being evaluated. The set of values may be nested
within a particular dimension (d) and segment (s).

v = A specific element (value) of {V}.

{P} = The set of profiles in use.

p = A specific element of {P}.

$\mathrm{LCount}_{vds}$ = For a given split of segment s, dimension d, by value v, the count
of "population" in the "left" split.

$\mathrm{RCount}_{vds}$ = For a given split of segment s, dimension d, by value v, the count
of "population" in the "right" split.

$\mathrm{TCount}_{s}$ = For a given segment s, the count of "population" contained in the
segment prior to being split.
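
The split measure can be sketched in Python as follows. This is an illustrative sketch, not the patented program: the function and parameter names are hypothetical, and it assumes that the measure sums the absolute difference between left and right penetration over the profiles in use.

```python
def penetration(profile_count, base_count):
    """Count of the profile divided by count of the base (0.0 for an empty base)."""
    return profile_count / base_count if base_count else 0.0


def split_score(l_count, r_count, t_count, population, l_pen, r_pen):
    """Score one candidate split of a segment.

    l_count, r_count: population in the left and right parts of the split
    t_count:          population of the segment before the split
    population:       total population over all segments
    l_pen, r_pen:     dicts mapping each profile to its left/right penetration
    """
    l_fract = l_count / t_count
    r_fract = r_count / t_count
    t_fract = t_count / population
    # Assumption: differences are taken in absolute value and summed over profiles.
    spread = sum(abs(l_pen[p] - r_pen[p]) for p in l_pen)
    return l_fract * r_fract * t_fract * spread
```

Under these assumptions, a balanced split of a large segment whose two sides differ sharply in penetration scores highest, which is the behavior the three fraction terms and the penetration term are meant to reward.
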
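[0023] This notation may be extended as follows:

$\mathrm{TPen}_{p(s)}$ = For a given profile p and segment s, prior to the proposed split on
dimension d, by value v, count of Profile in the segment $\div$ count of
base in the segment.

$$\mathrm{LeftGini}_{p(vds)} = 2 \times \mathrm{LFract}_{vds} \times \mathrm{LPen}_{p(vds)} \times \left(1 - \mathrm{LPen}_{p(vds)}\right)$$

$$\mathrm{RightGini}_{p(vds)} = 2 \times \mathrm{RFract}_{vds} \times \mathrm{RPen}_{p(vds)} \times \left(1 - \mathrm{RPen}_{p(vds)}\right)$$

$$\mathrm{TopGini}_{p(s)} = 2 \times \mathrm{TPen}_{p(s)} \times \left(1 - \mathrm{TPen}_{p(s)}\right)$$

$$\Delta\mathrm{Gini}_{p(vds)} = \mathrm{TFract}_{s} \times \left(\mathrm{TopGini}_{p(s)} - \mathrm{RightGini}_{p(vds)} - \mathrm{LeftGini}_{p(vds)}\right)$$

The split is accordingly chosen to maximize the change in the "Gini" impurity measure:

$$\sum_{p} \Delta\mathrm{Gini}_{p(vds)}$$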
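
The change in the "Gini" impurity measure can be sketched in Python as follows. The function names are hypothetical, and the left-hand term is assumed to mirror the right-hand one (weighted by the left fraction), so this is an illustration under those assumptions rather than the program's actual code.

```python
def gini_term(pen):
    """Two-class Gini impurity 2 * p * (1 - p) for a penetration p."""
    return 2.0 * pen * (1.0 - pen)


def delta_gini(l_fract, r_fract, t_fract, l_pen, r_pen, t_pen):
    """Change in Gini impurity for one profile under one candidate split."""
    top = gini_term(t_pen)
    right = r_fract * gini_term(r_pen)
    left = l_fract * gini_term(l_pen)   # assumed symmetric to the right-hand term
    return t_fract * (top - right - left)


def total_delta_gini(l_fract, r_fract, t_fract, pens):
    """Sum the change over all profiles; pens holds (l_pen, r_pen, t_pen) tuples."""
    return sum(delta_gini(l_fract, r_fract, t_fract, l, r, t) for l, r, t in pens)
```

A split that separates a profile perfectly (penetration 0 on one side, 1 on the other) removes all of the segment's impurity for that profile, while a split that leaves both sides at the segment's original penetration yields no change.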

[0024] Figure 5 shows a schematic illustration of a segmentation system 500 according to one 
embodiment of the invention. The system comprises a primary partitioning module 
510. The primary partitioning module 510 defines segments by using a specified set of 
variables or dimensions. The definition of the segments is dictated by the ability of the
segments to create high lift profiles for the evaluation profiles. The primary 
partitioning module 510 may use a greedy algorithm to sequentially split the data into 
partitions that at each point create local maxima. The partitioning module 510 may also 
be used to output the definition of the segments or the assignments themselves. In one 
embodiment, the primary partitioning module 510 is a program written in Borland
Delphi 5. 

[0025] The partitioning module 510 communicates with a profile definitions module 520, 
which may be implemented as a database. The profile definitions module 520 may 
define profiles, their bases and whether they should be used. The profile definitions
module 520 may also contain data for defining evaluation profiles, their bases, their 
classification, and for indicating those which should be used in the evaluation analysis. 
This data is provided to the partitioning module 510 for optimization of the
segmentation. Additional data may be contained in the profile definitions module 520
to keep track of the performance of any models created and the rules for creating the
models, and to compare their performance. In one embodiment, the profile definitions
module 520 comprises a Microsoft Access database. 

[0026] The segmentation system 500 further comprises a profile data module 530. The profile
data module 530 contains profile data (summaries of counts). The primary partitioning 
module 510 uses this data for assessment of the segmentation. In one embodiment, the 
profile data module 530 is a file comprising records with as many columns as there are
profiles. 

[0027] A segment definitions module 540 is provided in communication with the primary
partitioning module 510. The segment definitions module 540 may be implemented as
a dBase file containing one record per geocode for providing this data to the primary 
partitioning module 510. The primary partitioning module 510 uses this data to define 
the segments. The file may comprise a predetermined number of segmenting variables. 

[0028] The segmentation system 500 also comprises a cluster assignments module 550. The 
cluster assignments module 550 may be implemented as a dBase table containing one 
record per geocode. The cluster assignments module 550 contains the assignments of 
the clusters which are updated by the primary partitioning program 510 pursuant to 
optimization based on data received from the profile definitions module 520, the profile 
data module 530 and the segment definitions module 540. 

[0029] Thus, the primary partitioning module 510 may execute a program using data from the 
profile definitions module 520, the profile data module 530 and the segment definitions 
module 540. The program may perform optimization using classification trees as 
described above to output optimal cluster assignments to the cluster assignments module
550. 

[0030] Figure 6 illustrates a segmentation system 600 according to another embodiment of the 
invention. Similarly to the segmentation system 500 of Figure 5, segmentation system 
600 of Figure 6 comprises a partitioning module 610 which uses data from a profile
definitions module 620, a profile data module 630 and a segment definitions module
640 to output optimal cluster assignments to a cluster assignments module 650. 

[0031] The segmentation system 600 further comprises a summary module 660, into which 
data from the cluster assignments module 650 is input. In one embodiment, the 
summary module 660 is implemented as a software program written in Borland Delphi 
5. The summary module 660 generates model performance statistics and outputs them 
to a summary data module 670. The summary data module 670 may be implemented as
a Microsoft Access database. 

[0032] In one embodiment, the segmentation system is capable of accommodating up to 250
profiles for evaluating performance, 16,000 records, 20 variables for defining the
segments and 80 created segments. With these limits, the system may be implemented
on a computer system using the Microsoft Windows NT operating system, for example,
requiring approximately 45 MB of memory. The summarization module may be
implemented on such a system within 5 MB of additional memory while providing up to
100 segments and 999 profiles.

[0033] The segmentation systems described above may be used as follows. First, the variables 
to be used for segmentation are defined in the segment definitions module. A program
that can generate dBase tables, such as SPSS, may be used.

[0034] Next, the segments are created by the partitioning module. Data may first be loaded 
into the module by making selections through a menu-type user interface. Program 
information may be provided in a message window on the display. Data in the profile
definitions module controls which profiles are actually used to evaluate the partitioning
within the program. A "select" field within the data may be used for this purpose. 
The program may, by default, load and use only those variables whose "select" value is
"0" (used to indicate a base that is always loaded) or "1". The "select" field set to be
loaded may be changed using the menu-type user interface. Several sets of profiles 
may be created. The user can then assess over-specification in the model by comparing 
the performance between the set of profiles used in the program for assessment and the
remaining sets. 
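
The "select" mechanism can be illustrated with a short Python sketch. Only the "select" field comes from the description above; the function name, the "name" key, the integer encoding of the select values and the sample rows are hypothetical.

```python
def profiles_to_load(profile_definitions, selected=(0, 1)):
    """Return the names of profiles whose "select" value is in the set to be loaded.

    By default, 0 (a base that is always loaded) and 1 are loaded.
    """
    return [row["name"] for row in profile_definitions if row["select"] in selected]


# Hypothetical profile definitions: one base plus two sets of profiles.
definitions = [
    {"name": "all_households", "select": 0},
    {"name": "owns_dog", "select": 1},
    {"name": "buys_luxury_car", "select": 2},
]
```

Keeping a second set (here, select value 2) out of the partitioning run leaves it available as a holdout for the over-specification check: performance on the profiles used for assessment can be compared with performance on the profiles that were not.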

[0035] After loading the data, the profile section of the display may contain the list of all
evaluation profiles that have been loaded. An "InUse" field may indicate whether the
profile is currently being used to evaluate the partitioning. Note that profiles that are 
bases and profiles that have low counts may be turned off. The user can control the 
low count limit via the user interface. A profile may be manually turned on/off by
modifying the "InUse" field. 

[0036] The executed splits may be listed in a "Split Views" area of the display. This view
may show the splits that have been made in the order that they have been applied. The
information may show the split number, the dimension that was used in the split, its
value (all splits are made as <= value versus > value), and the row (or segment) that
was split. Note that in this view, selecting a split sets the "active model" to this point. 
Information that is characteristic of the model may be presented on the display. 
Further, on entering the "splits" window, this may be the point from which further 
segmentation will begin. Thus, for example, selecting split "1-None" effectively resets
a model to the beginning. A "performance summary" view may show the performance
statistics for the models as well as graphical information. A "dim by value" view may
show the dimensions and values used in the splits in summarized form. Fully 
collapsed, the dimension and number of times it was used may be viewed. A level may 
be expanded to show the split values and the number of occurrences of that value. 
Final expansion may show the actual splits. 

[0037] The tab section may have five sections available. Selecting a "profile and segment
statistics" tab may provide either a view of specific profiles or general characteristics of
the generated segments. The user may control the profile presented by selecting 
various profiles from the profile list and control the level of the model displayed from 
the split views control. A "model performance" tab may show a graph of the model 
performance in split order. A "split hierarchy display" tab may show the splits in a 
traditional hierarchical form. A "row dimension data" tab may show input data. 
Another tab, "Session Model History", may provide information on the models
generated in the current session. A model may be stored on this page each time a split
is executed at a higher level than the existing model. For example, after creating 15
segments, a user may select split 7 and create a new and different split at this stage. 
The previous 15-cell model will then be stored. From this page, the new "current" 
model (from the splits table) may be compared to previous models. A previous set of 
splits may also be restored from this page. 

[0038] To start splitting the data, a mechanism may be selected within the "split" window. In
one embodiment, three such mechanisms are offered. The user may request the
program to suggest a split by clicking a "Find best split" button. This will cause the
program to look for the "best" split currently available. The recommended splits will 
appear in the proposed "splits list" in order of their relative lift. The recommended 
split may be executed by clicking a "Make Split" button. The user can select a 
different split by, for example, double-clicking on the desired alternate split. This 
action will change the split shown in the first row that is labeled "order 0". As a
second option, the user can manually force a split by using dropdown controls. A
specific dimension, a specific row (or segment), and a specific value may be selected. 
Only valid splits may be displayed. The third option is to let the program make a 
specific number of splits on its own. The user may enter the number of splits desired into an
"iterations" box and click a "Split X Times" button. The program will stop when
either the desired number of splits have been made or no further valid splits are 
available. 
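
The third option, in which the program makes a specific number of splits on its own, can be sketched in Python as a greedy loop. This is a simplified sketch under stated assumptions: the record layout (dictionaries with a single binary "buys" profile), the function names and the scoring of a split by the absolute penetration difference are all hypothetical stand-ins for the program's actual data and measure.

```python
def pen(rows):
    """Penetration of the hypothetical "buys" profile within a set of records."""
    return sum(r["buys"] for r in rows) / len(rows)


def score(left, right, population):
    """Assumed split measure: size fractions times the penetration difference."""
    n = len(left) + len(right)
    return (len(left) / n) * (len(right) / n) * (n / population) * abs(pen(left) - pen(right))


def find_best_split(segments, dims, population, min_size):
    """Search every segment, dimension and value for the best valid split."""
    best = None
    for s, seg in enumerate(segments):
        for d in dims:
            for v in sorted({r[d] for r in seg})[:-1]:
                left = [r for r in seg if r[d] <= v]
                right = [r for r in seg if r[d] > v]
                if min(len(left), len(right)) < min_size:
                    continue  # not a valid split
                cand = (score(left, right, population), s, d, v, left, right)
                if best is None or cand[0] > best[0]:
                    best = cand
    return best


def split_x_times(rows, dims, iterations, min_size=1):
    """Apply the best split repeatedly; stop early when no valid split remains."""
    segments, history = [list(rows)], []
    for _ in range(iterations):
        best = find_best_split(segments, dims, len(rows), min_size)
        if best is None:
            break
        _, s, d, v, left, right = best
        segments[s:s + 1] = [left, right]
        history.append((s, d, v))
    return segments, history
```

Because each iteration commits to the locally best split, the loop is greedy: it never revisits an earlier split, which is why the ability to return to an earlier split and force an alternative is useful.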

[0039] Two values may be used to control the valid splits. A minimum segment size may 
control the minimum population base required. No splits will be created below this 
threshold. This minimum value may be defined and altered by the user. Further, the 
program will not make a split more unbalanced than the value indicated in a "min split
fraction" box. 
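
The two controls can be expressed as a simple validity check, sketched below in Python. The function name is hypothetical, and two interpretations are assumed: that the minimum segment size applies to each side of the split, and that the "min split fraction" is a lower bound on the smaller side's share of the segment being split.

```python
def is_valid_split(left_count, right_count, min_segment_size, min_split_fraction):
    """Return True when a candidate split satisfies both controls."""
    total = left_count + right_count
    if total == 0:
        return False
    smaller = min(left_count, right_count)
    # Assumption: neither resulting segment may fall below the minimum size.
    if smaller < min_segment_size:
        return False
    # Assumption: the smaller side must hold at least this share of the segment.
    return smaller / total >= min_split_fraction
```
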

[0040] The options menu item on the "Splits" form may provide the mechanism to select either
"twoing" or "gini" as the measure used to evaluate candidate splits. This menu may 
also provide the option of using recursion. Recursion should only be used with the 
"gini" criterion. With recursion, the program will, for each possible split, evaluate all
next-level possible splits before making a decision on a specific split.
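
The recursion option can be sketched in Python as a one-level (or deeper) look-ahead on the "gini" criterion. This is an illustrative sketch only: the data layout (a list of `(value, flag)` pairs on a single dimension), the function names and the choice to weight each child's best gain by its share of the parent are assumptions.

```python
def gini_gain(left, right):
    """Drop in Gini impurity 2p(1-p) when a list of 0/1 flags is split in two."""
    def g(flags):
        p = sum(flags) / len(flags)
        return 2.0 * p * (1.0 - p)
    n = len(left) + len(right)
    return g(left + right) - len(left) / n * g(left) - len(right) / n * g(right)


def candidate_splits(rows):
    """Yield (value, left, right) for every '<= value' versus '> value' split."""
    for v in sorted({x for x, _ in rows})[:-1]:
        yield v, [r for r in rows if r[0] <= v], [r for r in rows if r[0] > v]


def lookahead_gain(rows, left, right, depth):
    """Gain of a split plus the best gains reachable in each child (assumed weighting)."""
    gain = gini_gain([f for _, f in left], [f for _, f in right])
    if depth > 0:
        n = len(rows)
        gain += len(left) / n * best_gain(left, depth - 1)
        gain += len(right) / n * best_gain(right, depth - 1)
    return gain


def best_gain(rows, depth):
    """Best achievable gain over all candidate splits, looking `depth` levels ahead."""
    return max(
        (lookahead_gain(rows, l, r, depth) for _, l, r in candidate_splits(rows)),
        default=0.0,
    )
```

With `depth=0` this reduces to the ordinary greedy evaluation; each additional level multiplies the number of splits evaluated, so look-ahead is considerably more expensive.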

[0041] Next, the model may be written out in one of several ways. In one embodiment, there
are four options on the file menu that assist in dealing with a model. "Split Vars Used"
may display the dimensions available and the number of splits made with each of those
dimensions. "Show Definition" may create SPSS code, for example, to make the
assignments using the current active model and may place text on the message form. 
"Show limits" may place an obscure definition of the splits in the message form. 
"Dump assignments" may update data in the cluster assignments module with the 
assignments from the active model. 

[0042] Next, the model performance over all available profiles may be summarized via the
summarization module. The module may read the segment assignments from the 
cluster assignments module, match them against binary data in the profile data module, 
and summarize the profiles. The summarization module may use data from the profile 
definitions module to define the bases and location of the information. The 
summarization module may summarize all profiles available in the binary data set. The 
summarization module may also create the summary data module. 
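
The summarization step can be sketched in Python as follows. The data layout (dictionaries keyed by geocode), the function name and the lift index (segment penetration relative to overall penetration, scaled to 100) are assumptions made for illustration; the description above does not specify how the summary statistics are computed.

```python
from collections import defaultdict


def summarize(assignments, profile_counts, base_counts):
    """Summarize one profile by segment.

    assignments:    geocode -> segment number
    profile_counts: geocode -> count of the profile
    base_counts:    geocode -> count of the profile's base
    """
    prof = defaultdict(int)
    base = defaultdict(int)
    for geocode, segment in assignments.items():
        prof[segment] += profile_counts.get(geocode, 0)
        base[segment] += base_counts.get(geocode, 0)
    total_base = sum(base.values())
    overall = sum(prof.values()) / total_base if total_base else 0.0
    summary = {}
    for segment in base:
        if base[segment] == 0 or overall == 0.0:
            continue  # penetration or lift is undefined
        p = prof[segment] / base[segment]
        # Assumed lift index: 100 means the segment matches the overall rate.
        summary[segment] = {"penetration": p, "lift": 100.0 * p / overall}
    return summary
```
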

While particular embodiments of the present invention have been disclosed, it is to be
understood that various different modifications and combinations are possible and are
contemplated within the true spirit and scope of the appended claims. There is no
intention, therefore, of limitations to the exact abstract or disclosure herein presented.






